The analyses here compare shape variation among the species.
We first tried importing the TPS files directly into R to perform all analyses, but could not get this to work. Instead, we imported them into MorphoJ, performed a Procrustes superimposition, and ran a multivariate regression of shape (Procrustes coordinates) on size (centroid size) to correct for overall body-size differences between the species. Below, we import the residuals of this regression and use them for all subsequent analyses.
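For the record, the all-in-R pipeline we attempted would look roughly like the following. This is only a sketch, assuming the geomorph package and a hypothetical "landmarks.tps" file; this is the step we could not get working with our files, hence the MorphoJ detour.

```r
# Sketch only: TPS -> Procrustes -> size-correction residuals, all in R.
# Assumes geomorph is installed and a hypothetical "landmarks.tps" file exists.
library(geomorph)
lands <- readland.tps("landmarks.tps", specID = "ID")  # 3D landmark array
gpa <- gpagen(lands)                                   # Procrustes superimposition
gdf <- geomorph.data.frame(coords = gpa$coords, size = gpa$Csize)
fit <- procD.lm(coords ~ log(size), data = gdf)        # regress shape on centroid size
shape.resid <- fit$residuals                           # size-corrected shape residuals
```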
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(curl)
## Using libcurl 8.3.0 with Schannel
##
## Attaching package: 'curl'
##
## The following object is masked from 'package:readr':
##
## parse_date
test.df <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/MV%20regression%2C%20results.txt")
test.df <- read.table(test.df, header = TRUE, sep = "\t")
We’ll run a PCA to visualize overall shape differences, then perform Levene’s tests on the PC scores to assess differences in variation among the species.
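As a reminder of what Levene’s test (as implemented in car::leveneTest with center = median, i.e. the Brown–Forsythe variant) actually does: it is a one-way ANOVA on the absolute deviations of each observation from its group median. A minimal base-R sketch on invented toy data:

```r
# Levene's test by hand (center = median): one-way ANOVA on the
# absolute deviations of each observation from its group median.
# Toy data for illustration only; group "c" has inflated variance.
set.seed(1)
toy <- data.frame(
  score = c(rnorm(30, sd = 1), rnorm(30, sd = 1), rnorm(30, sd = 3)),
  group = rep(c("a", "b", "c"), each = 30)
)
toy$absdev <- with(toy, abs(score - ave(score, group, FUN = median)))
summary(aov(absdev ~ group, data = toy))  # this F test is Levene's test
```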
PCA.test <- prcomp(test.df[, 7:38], center = TRUE, scale. = FALSE) # scale. = FALSE uses the covariance matrix, as in MorphoJ; scale. = TRUE is only appropriate when variables are on different scales
summary(PCA.test)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.02836 0.01986 0.01969 0.01378 0.01248 0.01121 0.009914
## Proportion of Variance 0.30299 0.14861 0.14601 0.07156 0.05867 0.04729 0.037020
## Cumulative Proportion 0.30299 0.45160 0.59760 0.66916 0.72783 0.77512 0.812150
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.008759 0.008336 0.00762 0.00651 0.006144 0.005412
## Proportion of Variance 0.028900 0.026170 0.02187 0.01596 0.014220 0.011030
## Cumulative Proportion 0.841040 0.867210 0.88908 0.90504 0.919260 0.930290
## PC14 PC15 PC16 PC17 PC18 PC19
## Standard deviation 0.005218 0.005002 0.004721 0.004455 0.003892 0.003739
## Proportion of Variance 0.010260 0.009420 0.008390 0.007470 0.005710 0.005270
## Cumulative Proportion 0.940550 0.949980 0.958370 0.965850 0.971550 0.976820
## PC20 PC21 PC22 PC23 PC24 PC25
## Standard deviation 0.00333 0.003231 0.00321 0.002557 0.002468 0.002408
## Proportion of Variance 0.00418 0.003930 0.00388 0.002460 0.002290 0.002180
## Cumulative Proportion 0.98099 0.984930 0.98881 0.991270 0.993560 0.995750
## PC26 PC27 PC28 PC29 PC30 PC31
## Standard deviation 0.002136 0.002034 0.00161 2.896e-16 2.126e-16 2.73e-17
## Proportion of Variance 0.001720 0.001560 0.00098 0.000e+00 0.000e+00 0.00e+00
## Cumulative Proportion 0.997470 0.999020 1.00000 1.000e+00 1.000e+00 1.00e+00
## PC32
## Standard deviation 1.825e-17
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
evplot <- function(ev) {
  # Broken stick model (MacArthur 1957)
  n <- length(ev)
  bsm <- data.frame(j = seq_len(n), p = 0)
  bsm$p[1] <- 1 / n
  for (i in 2:n) bsm$p[i] <- bsm$p[i - 1] + (1 / (n + 1 - i))
  bsm$p <- 100 * bsm$p / n
  # Plot eigenvalues and % of variation for each axis
  op <- par(mfrow = c(2, 1))
  barplot(ev, main = "Eigenvalues", col = "bisque", las = 2)
  abline(h = mean(ev), col = "red")
  legend("topright", "Average eigenvalue", lwd = 1, col = 2, bty = "n")
  barplot(t(cbind(100 * ev / sum(ev), bsm$p[n:1])), beside = TRUE,
          main = "% variation", col = c("bisque", 2), las = 2)
  legend("topright", c("% eigenvalue", "Broken stick model"),
         pch = 15, col = c("bisque", 2), bty = "n")
  par(op)
}
ev <- PCA.test$sdev^2
evplot(ev) # the scree/broken-stick comparison suggests retaining the first 4 PCs, though this criterion is a heuristic rather than a formal test
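For reference, the broken-stick null that evplot() compares against has a closed form: if total variance is partitioned at random among n components, the expected proportion for the k-th largest piece is (1/n) * sum over i = k..n of 1/i. A quick base-R check (the function name broken_stick is ours, not from any package):

```r
# Expected proportion of variance for component k under the
# broken-stick null model: (1/n) * sum_{i=k}^{n} (1/i)
broken_stick <- function(n) {
  sapply(seq_len(n), function(k) sum(1 / (k:n)) / n)
}
round(broken_stick(5), 3)  # 0.457 0.257 0.157 0.090 0.040
```

PCs whose observed variance proportion exceeds this expectation are the ones worth retaining.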
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
(eig.val.test <- get_eigenvalue(PCA.test))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 8.044309e-04 3.029912e+01 30.29912
## Dim.2 3.945433e-04 1.486058e+01 45.15971
## Dim.3 3.876418e-04 1.460064e+01 59.76035
## Dim.4 1.899869e-04 7.155909e+00 66.91626
## Dim.5 1.557587e-04 5.866694e+00 72.78295
## Dim.6 1.255665e-04 4.729499e+00 77.51245
## Dim.7 9.829059e-05 3.702143e+00 81.21459
## Dim.8 7.672005e-05 2.889683e+00 84.10427
## Dim.9 6.948059e-05 2.617006e+00 86.72128
## Dim.10 5.806752e-05 2.187130e+00 88.90841
## Dim.11 4.237375e-05 1.596019e+00 90.50443
## Dim.12 3.775063e-05 1.421888e+00 91.92632
## Dim.13 2.928853e-05 1.103161e+00 93.02948
## Dim.14 2.723190e-05 1.025697e+00 94.05518
## Dim.15 2.502220e-05 9.424683e-01 94.99764
## Dim.16 2.228727e-05 8.394564e-01 95.83710
## Dim.17 1.984478e-05 7.474592e-01 96.58456
## Dim.18 1.514869e-05 5.705797e-01 97.15514
## Dim.19 1.398100e-05 5.265985e-01 97.68174
## Dim.20 1.108613e-05 4.175621e-01 98.09930
## Dim.21 1.044038e-05 3.932397e-01 98.49254
## Dim.22 1.030651e-05 3.881977e-01 98.88074
## Dim.23 6.537065e-06 2.462204e-01 99.12696
## Dim.24 6.089258e-06 2.293536e-01 99.35631
## Dim.25 5.797141e-06 2.183509e-01 99.57466
## Dim.26 4.563074e-06 1.718695e-01 99.74653
## Dim.27 4.138116e-06 1.558633e-01 99.90240
## Dim.28 2.591368e-06 9.760463e-02 100.00000
## Dim.29 8.389200e-32 3.159816e-27 100.00000
## Dim.30 4.520650e-32 1.702716e-27 100.00000
## Dim.31 7.455525e-34 2.808145e-29 100.00000
## Dim.32 3.331721e-34 1.254902e-29 100.00000
ind.test <- get_pca_ind(PCA.test)
head(ind.test$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 1 0.01340669 0.0078857358 0.014029153 -0.0106578735
## 2 0.01342228 -0.0223168798 0.022468552 -0.0050787697
## 3 -0.02748104 -0.0096209523 0.014553590 0.0091966388
## 4 -0.02001009 0.0003470414 0.008558898 0.0037383369
## 5 -0.03116908 0.0111344766 0.013840877 0.0051050439
## 6 -0.01375579 0.0166512862 0.004231201 -0.0008373614
test.df <- cbind(test.df, ind.test$coord[,1:4])
(loadings <- PCA.test$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## RegResid1 0.135008248 -0.110107210 0.16716573 -0.274849465 0.052275630
## RegResid2 0.155310001 -0.133546894 -0.01466466 0.105705695 -0.190927188
## RegResid3 0.263422372 0.006700256 -0.82407265 0.042422404 0.259518193
## RegResid4 0.148360094 0.156240124 -0.16632659 0.198769756 0.070899948
## RegResid5 -0.670432329 -0.505275356 -0.17727123 0.232038069 -0.095963475
## RegResid6 0.174401325 0.117100951 0.15249628 0.370790715 0.143331415
## RegResid7 0.205954566 0.026521398 0.18463118 0.029099777 0.115944040
## RegResid8 0.029040819 -0.029517028 0.02861223 0.282470957 0.037279697
## RegResid9 -0.079230864 0.183352491 -0.02246982 -0.238692956 0.169871081
## RegResid10 0.100564008 0.002877471 -0.10554812 0.056717141 -0.297849545
## RegResid11 -0.105055841 0.247292942 0.01872475 -0.026959085 -0.001853964
## RegResid12 0.059706631 -0.090181853 -0.12731485 -0.093271461 -0.238773394
## RegResid13 -0.042277217 0.087980640 -0.13963359 -0.340711764 -0.209979248
## RegResid14 0.005249607 -0.188157464 -0.09839363 -0.199721166 -0.055348975
## RegResid15 0.143634501 -0.185964018 0.12708402 0.054968122 0.139445654
## RegResid16 -0.072327608 -0.031638025 0.09710973 -0.025205391 0.369938210
## RegResid17 0.188160260 -0.387448897 0.13142518 -0.018493539 0.196717211
## RegResid18 -0.158585258 0.035307617 0.06818033 -0.078120936 0.345505236
## RegResid19 -0.003795688 -0.134926056 0.06605121 0.099567189 0.097667438
## RegResid20 -0.363086378 0.352140160 -0.05579576 -0.145489784 0.136992213
## RegResid21 -0.145276979 0.213625337 0.09734773 0.425220615 0.077901415
## RegResid22 -0.044727402 0.001069053 0.00684746 -0.188717407 0.008350602
## RegResid23 -0.030163021 0.231249783 0.04663237 -0.079076106 0.016419621
## RegResid24 -0.011025942 -0.098728303 0.06183703 -0.049875094 0.024288123
## RegResid25 0.023045499 0.169496081 0.02669218 0.143853048 -0.326092317
## RegResid26 -0.122714633 0.038725202 0.01041427 -0.024921153 -0.034698824
## RegResid27 -0.038783243 0.209659503 0.01854588 0.157356674 -0.317146993
## RegResid28 -0.119849240 0.035390979 0.06277723 -0.168120638 -0.019423858
## RegResid29 0.074531941 -0.035206773 0.15266214 -0.174676652 -0.067499447
## RegResid30 0.118234295 -0.083421229 0.04286607 0.006064492 -0.110964461
## RegResid31 0.081257793 -0.016950123 0.12648494 -0.031066332 -0.107224840
## RegResid32 0.101449680 -0.083660762 0.03690298 -0.047075726 -0.188599199
sorted.loadings.1 <- loadings[order(loadings[, 1]), 1]
myTitle <- "Loadings Plot for PC1"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.1, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.2 <- loadings[order(loadings[, 2]), 2]
myTitle <- "Loadings Plot for PC2"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.2, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.3 <- loadings[order(loadings[, 3]), 3]
myTitle <- "Loadings Plot for PC3"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.3, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.4 <- loadings[order(loadings[, 4]), 4]
myTitle <- "Loadings Plot for PC4"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.4, main=myTitle, xlab=myXlab, cex=1.5, col="red")
lat.p <- test.df[test.df$Species == "p.latipinna",]
form.p <- test.df[test.df$Species == "p.formosa",]
(pc1 <- t.test(lat.p$Dim.1, form.p$Dim.1, alternative = "two.sided"))
##
## Welch Two Sample t-test
##
## data: lat.p$Dim.1 and form.p$Dim.1
## t = 24.956, df = 303.49, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.03970850 0.04650662
## sample estimates:
## mean of x mean of y
## 0.02497059 -0.01813697
(pc2 <- t.test(lat.p$Dim.2, form.p$Dim.2, alternative = "two.sided"))
##
## Welch Two Sample t-test
##
## data: lat.p$Dim.2 and form.p$Dim.2
## t = -0.48969, df = 277.42, p-value = 0.6247
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.005099263 0.003067681
## sample estimates:
## mean of x mean of y
## 0.001469468 0.002485259
(pc3 <- t.test(lat.p$Dim.3, form.p$Dim.3, alternative = "two.sided"))
##
## Welch Two Sample t-test
##
## data: lat.p$Dim.3 and form.p$Dim.3
## t = 4.4683, df = 303.32, p-value = 1.115e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.005281476 0.013594355
## sample estimates:
## mean of x mean of y
## 0.005540339 -0.003897577
(pc4 <- t.test(lat.p$Dim.4, form.p$Dim.4, alternative = "two.sided"))
(pc1V <- leveneTest(Dim.1~Species, data=test.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.3452 0.7084
## 337
(pc2V <- leveneTest(Dim.2~Species, data=test.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 7.5716 0.0006074 ***
## 337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc3V <- leveneTest(Dim.3~Species, data=test.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 4.3582 0.01353 *
## 337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc4V <- leveneTest(Dim.4~Species, data=test.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 4.6175 0.01051 *
## 337
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(dplyr)
test.df %>%
group_by(Species) %>%
summarise(sd_dim1=sd(Dim.1, na.rm = TRUE),
sd_dim2=sd(Dim.2, na.rm = TRUE),
sd_dim3=sd(Dim.3, na.rm = TRUE),
sd_dim4=sd(Dim.4, na.rm = TRUE))
## # A tibble: 3 × 5
## Species sd_dim1 sd_dim2 sd_dim3 sd_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 p.formosa 0.0140 0.0150 0.0212 0.0154
## 2 p.latipinna 0.0166 0.0213 0.0163 0.0120
## 3 p.mexicana 0.0158 0.0175 0.0195 0.0100
test.df %>%
group_by(Species) %>%
summarise(var_dim1=var(Dim.1, na.rm = TRUE),
var_dim2=var(Dim.2, na.rm = TRUE),
var_dim3=var(Dim.3, na.rm = TRUE),
var_dim4=var(Dim.4, na.rm = TRUE))
## # A tibble: 3 × 5
## Species var_dim1 var_dim2 var_dim3 var_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 p.formosa 0.000197 0.000226 0.000449 0.000237
## 2 p.latipinna 0.000277 0.000455 0.000266 0.000145
## 3 p.mexicana 0.000249 0.000306 0.000382 0.0000999
library(AMR)
library(ggplot2)
library(ggfortify)
library(ggbiplot)
##
## Attaching package: 'ggbiplot'
## The following object is masked from 'package:ggfortify':
##
## ggbiplot
plot1 <- autoplot(PCA.test, data = test.df, colour = "Species",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot1
plot2 <- autoplot(PCA.test, x = 2, y = 3, data = test.df, colour = "Species",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot2
plot3 <- autoplot(PCA.test, x = 3, y = 4, data = test.df, colour = "Species",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot3
Here, we’ll analyze the amazon and sailfin mollies separately (excluding the atlantic molly, since those specimens all come from a single location) and test whether their variation differs by zone or basin.
lat.df <- lat.p[, 1:38]
PCA.lat <- prcomp(lat.df[, 7:38], center = TRUE, scale. = FALSE) # scale. = FALSE uses the covariance matrix, as in MorphoJ; scale. = TRUE is only appropriate when variables are on different scales
summary(PCA.lat)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 0.02256 0.0187 0.01486 0.01233 0.01193 0.01139 0.009291
## Proportion of Variance 0.24630 0.1693 0.10690 0.07356 0.06885 0.06274 0.041780
## Cumulative Proportion 0.24630 0.4156 0.52250 0.59607 0.66491 0.72765 0.769420
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.008792 0.008266 0.007412 0.006872 0.006553 0.005442
## Proportion of Variance 0.037400 0.033060 0.026590 0.022850 0.020780 0.014330
## Cumulative Proportion 0.806830 0.839890 0.866480 0.889330 0.910100 0.924430
## PC14 PC15 PC16 PC17 PC18 PC19
## Standard deviation 0.004901 0.004773 0.004131 0.00394 0.003564 0.003423
## Proportion of Variance 0.011630 0.011020 0.008260 0.00751 0.006150 0.005670
## Cumulative Proportion 0.936060 0.947080 0.955340 0.96285 0.969000 0.974670
## PC20 PC21 PC22 PC23 PC24 PC25
## Standard deviation 0.003141 0.003086 0.002761 0.00244 0.002291 0.002081
## Proportion of Variance 0.004780 0.004610 0.003690 0.00288 0.002540 0.002090
## Cumulative Proportion 0.979450 0.984060 0.987740 0.99062 0.993160 0.995260
## PC26 PC27 PC28 PC29 PC30 PC31
## Standard deviation 0.001913 0.001888 0.001604 2.602e-16 2.053e-16 2.351e-17
## Proportion of Variance 0.001770 0.001720 0.001250 0.000e+00 0.000e+00 0.000e+00
## Cumulative Proportion 0.997030 0.998750 1.000000 1.000e+00 1.000e+00 1.000e+00
## PC32
## Standard deviation 2.033e-17
## Proportion of Variance 0.000e+00
## Cumulative Proportion 1.000e+00
ev.lat <- PCA.lat$sdev^2
evplot(ev.lat) # the scree/broken-stick comparison suggests retaining the first 4 PCs, though this criterion is a heuristic rather than a formal test
(eig.val.lat <- get_eigenvalue(PCA.lat))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 5.089691e-04 2.463037e+01 24.63037
## Dim.2 3.498444e-04 1.692990e+01 41.56028
## Dim.3 2.209011e-04 1.068999e+01 52.25027
## Dim.4 1.520141e-04 7.356367e+00 59.60664
## Dim.5 1.422644e-04 6.884553e+00 66.49119
## Dim.6 1.296384e-04 6.273549e+00 72.76474
## Dim.7 8.632518e-05 4.177506e+00 76.94225
## Dim.8 7.729411e-05 3.740468e+00 80.68271
## Dim.9 6.831973e-05 3.306174e+00 83.98889
## Dim.10 5.493942e-05 2.658665e+00 86.64755
## Dim.11 4.721778e-05 2.284994e+00 88.93255
## Dim.12 4.293570e-05 2.077773e+00 91.01032
## Dim.13 2.961398e-05 1.433099e+00 92.44342
## Dim.14 2.402383e-05 1.162577e+00 93.60600
## Dim.15 2.277746e-05 1.102262e+00 94.70826
## Dim.16 1.706433e-05 8.257886e-01 95.53405
## Dim.17 1.552728e-05 7.514063e-01 96.28545
## Dim.18 1.270177e-05 6.146725e-01 96.90012
## Dim.19 1.172010e-05 5.671669e-01 97.46729
## Dim.20 9.867205e-06 4.775004e-01 97.94479
## Dim.21 9.520493e-06 4.607220e-01 98.40551
## Dim.22 7.622502e-06 3.688732e-01 98.77439
## Dim.23 5.951432e-06 2.880057e-01 99.06239
## Dim.24 5.250389e-06 2.540803e-01 99.31647
## Dim.25 4.329161e-06 2.094996e-01 99.52597
## Dim.26 3.658401e-06 1.770398e-01 99.70301
## Dim.27 3.563890e-06 1.724661e-01 99.87548
## Dim.28 2.573140e-06 1.245211e-01 100.00000
## Dim.29 6.771733e-32 3.277022e-27 100.00000
## Dim.30 4.216376e-32 2.040417e-27 100.00000
## Dim.31 5.529280e-34 2.675766e-29 100.00000
## Dim.32 4.133666e-34 2.000391e-29 100.00000
ind.lat <- get_pca_ind(PCA.lat)
head(ind.lat$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 1 -0.013075839 -0.005953445 -0.014516737 0.009782756
## 2 0.013522410 -0.030341882 0.008616451 -0.003137634
## 17 -0.013927232 -0.008037190 0.009111129 -0.015665363
## 18 0.007245554 -0.027191078 0.007520433 -0.001845440
## 19 0.015676977 -0.016972623 -0.005553874 -0.026107405
## 23 -0.047595164 0.009972702 -0.024688215 -0.006569138
lat.df <- cbind(lat.df, ind.lat$coord[,1:4])
(loadings <- PCA.lat$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## RegResid1 0.14547815 -0.051744212 0.016576621 -0.233346684 0.1354109128
## RegResid2 0.20511663 0.061326256 -0.092505451 0.096353152 0.2423355481
## RegResid3 0.18114602 0.658057873 0.399151049 0.039582838 -0.0958460295
## RegResid4 -0.06433836 0.214443471 -0.030276601 0.134196207 0.0867121424
## RegResid5 0.18003079 -0.359567339 0.390241875 0.552066753 -0.0189325511
## RegResid6 -0.03276204 -0.006891521 -0.300655521 -0.048761948 0.1574077800
## RegResid7 0.05739843 -0.029411891 -0.152611613 0.157458215 0.0987670533
## RegResid8 0.05636110 -0.009096604 -0.187059095 -0.128419963 0.0554541502
## RegResid9 -0.20966909 0.048090793 0.162978900 -0.337758137 0.0293593836
## RegResid10 0.07417347 0.216289282 -0.134709855 0.194656356 -0.0005791389
## RegResid11 -0.22953991 0.098644984 0.070234666 -0.084786854 0.2257186682
## RegResid12 0.12628795 0.177781772 0.020819533 0.133828507 0.0160621865
## RegResid13 -0.11619629 0.146952039 0.133191040 -0.082931189 -0.1857759922
## RegResid14 0.17611451 0.040999308 0.189237952 0.062890710 0.0301976768
## RegResid15 0.23980921 -0.145337687 -0.093736014 0.013053314 0.0114438328
## RegResid16 -0.03850511 -0.177226939 0.175514324 -0.225610667 0.0392127216
## RegResid17 0.40390913 -0.266299517 0.019912683 -0.079736988 -0.1102195397
## RegResid18 -0.13234224 -0.207949317 0.211760981 -0.192743219 0.0619430736
## RegResid19 0.06990651 -0.203098560 -0.070760984 -0.095082185 -0.2255307600
## RegResid20 -0.48271465 -0.159793422 0.286232887 0.197319880 -0.0839264257
## RegResid21 -0.21669943 -0.088906110 -0.062970382 0.178953806 0.5870064642
## RegResid22 0.01811216 0.013910250 0.063269208 -0.126078855 -0.0373247155
## RegResid23 -0.19823831 0.050960743 0.008991280 -0.055219021 0.0648427041
## RegResid24 0.11516054 -0.032864009 0.022084339 -0.101618237 0.2194744921
## RegResid25 -0.18593950 0.062824001 -0.310255432 0.161533731 -0.2181740626
## RegResid26 -0.10045570 -0.038730195 -0.004836385 0.045605245 -0.3107887763
## RegResid27 -0.20280977 0.052926576 -0.331437023 0.253239790 -0.1945615777
## RegResid28 -0.11130317 -0.126252520 -0.003469612 -0.073753439 -0.3031864638
## RegResid29 0.04711640 0.012048750 -0.074743569 -0.228717633 -0.0342755588
## RegResid30 0.09621868 0.026804331 -0.099662110 0.036845007 -0.0091344384
## RegResid31 0.03429768 0.013859557 -0.104763098 -0.158309756 -0.0692329475
## RegResid32 0.09487621 0.007249859 -0.115744595 -0.004708735 -0.1638598127
sorted.loadings.1 <- loadings[order(loadings[, 1]), 1]
myTitle <- "Loadings Plot for PC1"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.1, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.2 <- loadings[order(loadings[, 2]), 2]
myTitle <- "Loadings Plot for PC2"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.2, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.3 <- loadings[order(loadings[, 3]), 3]
myTitle <- "Loadings Plot for PC3"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.3, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.4 <- loadings[order(loadings[, 4]), 4]
myTitle <- "Loadings Plot for PC4"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.4, main=myTitle, xlab=myXlab, cex=1.5, col="red")
lat.df$Basin <- as.factor(lat.df$Basin)
### ZONE
(pc1V <- leveneTest(Dim.1~Zone, data=lat.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 4.8261 0.003081 **
## 152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc2V <- leveneTest(Dim.2~Zone, data=lat.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.6926 0.5579
## 152
(pc3V <- leveneTest(Dim.3~Zone, data=lat.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.5995 0.05431 .
## 152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc4V <- leveneTest(Dim.4~Zone, data=lat.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 3.4078 0.0192 *
## 152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lat.df %>%
group_by(Zone) %>%
summarise(sd_dim1=sd(Dim.1, na.rm = TRUE),
sd_dim2=sd(Dim.2, na.rm = TRUE),
sd_dim3=sd(Dim.3, na.rm = TRUE),
sd_dim4=sd(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Zone sd_dim1 sd_dim2 sd_dim3 sd_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Z1 0.0192 0.0169 0.0134 0.0109
## 2 Z2 0.0265 0.0156 0.0127 0.00926
## 3 Z3 0.0124 0.0210 0.0193 0.0157
## 4 Z4 0.0189 0.0185 0.0118 0.00946
lat.df %>%
group_by(Zone) %>%
summarise(var_dim1=var(Dim.1, na.rm = TRUE),
var_dim2=var(Dim.2, na.rm = TRUE),
var_dim3=var(Dim.3, na.rm = TRUE),
var_dim4=var(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Zone var_dim1 var_dim2 var_dim3 var_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Z1 0.000367 0.000286 0.000180 0.000118
## 2 Z2 0.000701 0.000243 0.000162 0.0000858
## 3 Z3 0.000154 0.000441 0.000374 0.000248
## 4 Z4 0.000357 0.000343 0.000139 0.0000894
### BASIN
(pc1V <- leveneTest(Dim.1~Basin, data=lat.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 2.3261 0.05893 .
## 151
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc2V <- leveneTest(Dim.2~Basin, data=lat.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 1.6178 0.1726
## 151
(pc3V <- leveneTest(Dim.3~Basin, data=lat.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.9236 0.4519
## 151
(pc4V <- leveneTest(Dim.4~Basin, data=lat.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 4 0.4782 0.7517
## 151
lat.df %>%
group_by(Basin) %>%
summarise(sd_dim1=sd(Dim.1, na.rm = TRUE),
sd_dim2=sd(Dim.2, na.rm = TRUE),
sd_dim3=sd(Dim.3, na.rm = TRUE),
sd_dim4=sd(Dim.4, na.rm = TRUE))
## # A tibble: 5 × 5
## Basin sd_dim1 sd_dim2 sd_dim3 sd_dim4
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 7040048310 0.0138 0.0210 0.0115 0.0115
## 2 7040048420 0.0225 0.0186 0.0157 0.0108
## 3 7040049270 0.0153 0.00958 0.00805 0.0122
## 4 7040049280 0.0244 0.0146 0.00940 0.0116
## 5 7040049990 0.0170 0.0120 0.0146 0.00910
lat.df %>%
group_by(Basin) %>%
summarise(var_dim1=var(Dim.1, na.rm = TRUE),
var_dim2=var(Dim.2, na.rm = TRUE),
var_dim3=var(Dim.3, na.rm = TRUE),
var_dim4=var(Dim.4, na.rm = TRUE))
## # A tibble: 5 × 5
## Basin var_dim1 var_dim2 var_dim3 var_dim4
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 7040048310 0.000190 0.000442 0.000133 0.000133
## 2 7040048420 0.000507 0.000345 0.000247 0.000117
## 3 7040049270 0.000233 0.0000919 0.0000648 0.000148
## 4 7040049280 0.000597 0.000214 0.0000884 0.000134
## 5 7040049990 0.000289 0.000143 0.000213 0.0000828
plot1.lat <- autoplot(PCA.lat, data = lat.df, colour = "Zone",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot1.lat
plot2.lat <- autoplot(PCA.lat, x = 2, y = 3, data = lat.df, colour = "Zone",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot2.lat
plot3.lat <- autoplot(PCA.lat, x = 3, y = 4, data = lat.df, colour = "Zone",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot3.lat
plot4.lat <- autoplot(PCA.lat, data = lat.df, colour = "Basin",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot4.lat
## Too few points to calculate an ellipse
plot5.lat <- autoplot(PCA.lat, x = 2, y = 3, data = lat.df, colour = "Basin",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot5.lat
## Too few points to calculate an ellipse
plot6.lat <- autoplot(PCA.lat, x = 3, y = 4, data = lat.df, colour = "Basin",
                      loadings = FALSE, loadings.label = FALSE,
                      frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#9986A5", "#902012", "#083CA0")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot6.lat
## Too few points to calculate an ellipse
form.df <- form.p[, 1:38]
PCA.form <- prcomp(form.df[, 7:38], center = TRUE, scale. = FALSE) # scale. = FALSE uses the covariance matrix, as in MorphoJ; scale. = TRUE is only appropriate when variables are on different scales
summary(PCA.form)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 0.02344 0.01664 0.01628 0.01277 0.01023 0.009619
## Proportion of Variance 0.28413 0.14316 0.13693 0.08425 0.05410 0.047830
## Cumulative Proportion 0.28413 0.42729 0.56422 0.64847 0.70258 0.750400
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 0.008828 0.007856 0.007764 0.006125 0.005853 0.005485
## Proportion of Variance 0.040290 0.031900 0.031160 0.019390 0.017710 0.015550
## Cumulative Proportion 0.790690 0.822590 0.853760 0.873150 0.890860 0.906410
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.00510 0.004821 0.00451 0.004428 0.004222 0.003803
## Proportion of Variance 0.01344 0.012020 0.01051 0.010140 0.009220 0.007480
## Cumulative Proportion 0.91985 0.931870 0.94238 0.952520 0.961740 0.969210
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 0.003697 0.003061 0.002721 0.002569 0.00229 0.002225
## Proportion of Variance 0.007060 0.004840 0.003830 0.003410 0.00271 0.002560
## Cumulative Proportion 0.976280 0.981120 0.984950 0.988360 0.99107 0.993630
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 0.002179 0.001857 0.001501 0.001368 2.956e-16 2.003e-16
## Proportion of Variance 0.002450 0.001780 0.001160 0.000970 0.000e+00 0.000e+00
## Cumulative Proportion 0.996090 0.997870 0.999030 1.000000 1.000e+00 1.000e+00
## PC31 PC32
## Standard deviation 3.359e-17 1.565e-17
## Proportion of Variance 0.000e+00 0.000e+00
## Cumulative Proportion 1.000e+00 1.000e+00
ev.form <- PCA.form$sdev^2
evplot(ev.form) # the scree/broken-stick comparison suggests retaining the first 4 PCs, though this criterion is a heuristic rather than a formal test
(eig.val.form <- get_eigenvalue(PCA.form))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 5.496561e-04 2.841328e+01 28.41328
## Dim.2 2.769383e-04 1.431573e+01 42.72901
## Dim.3 2.648949e-04 1.369317e+01 56.42218
## Dim.4 1.629873e-04 8.425276e+00 64.84746
## Dim.5 1.046576e-04 5.410050e+00 70.25751
## Dim.6 9.252281e-05 4.782767e+00 75.04028
## Dim.7 7.794052e-05 4.028967e+00 79.06924
## Dim.8 6.171449e-05 3.190197e+00 82.25944
## Dim.9 6.028164e-05 3.116129e+00 85.37557
## Dim.10 3.751212e-05 1.939108e+00 87.31468
## Dim.11 3.426278e-05 1.771140e+00 89.08582
## Dim.12 3.008127e-05 1.554986e+00 90.64080
## Dim.13 2.600863e-05 1.344460e+00 91.98526
## Dim.14 2.324619e-05 1.201662e+00 93.18693
## Dim.15 2.034107e-05 1.051487e+00 94.23841
## Dim.16 1.961158e-05 1.013778e+00 95.25219
## Dim.17 1.782749e-05 9.215537e-01 96.17374
## Dim.18 1.446328e-05 7.476482e-01 96.92139
## Dim.19 1.366637e-05 7.064535e-01 97.62785
## Dim.20 9.370883e-06 4.844076e-01 98.11225
## Dim.21 7.403114e-06 3.826880e-01 98.49494
## Dim.22 6.599851e-06 3.411650e-01 98.83611
## Dim.23 5.242350e-06 2.709920e-01 99.10710
## Dim.24 4.951417e-06 2.559528e-01 99.36305
## Dim.25 4.748539e-06 2.454655e-01 99.60852
## Dim.26 3.448335e-06 1.782542e-01 99.78677
## Dim.27 2.252950e-06 1.164614e-01 99.90323
## Dim.28 1.871958e-06 9.676684e-02 100.00000
## Dim.29 8.735415e-32 4.515584e-27 100.00000
## Dim.30 4.011564e-32 2.073692e-27 100.00000
## Dim.31 1.128436e-33 5.833205e-29 100.00000
## Dim.32 2.450323e-34 1.266642e-29 100.00000
ind.form <- get_pca_ind(PCA.form)
head(ind.form$coord[,1:4])
## Dim.1 Dim.2 Dim.3 Dim.4
## 3 0.018729631 0.008267946 0.012334104 9.235173e-03
## 4 0.011495712 0.005327166 0.006219287 -1.324276e-03
## 5 0.022341845 0.004254276 -0.007065372 -5.780291e-05
## 6 0.006758509 0.004209099 -0.008024476 -1.883621e-02
## 7 0.017476588 -0.006645061 0.004701094 5.995360e-03
## 8 0.023840809 0.004964162 -0.009996163 5.826681e-03
form.df <- cbind(form.df, ind.form$coord[,1:4])
(loadings <- PCA.form$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## RegResid1 0.121588733 -0.139305170 0.138920450 -0.05163370 0.245333226
## RegResid2 -0.108107578 0.179832327 0.075377629 -0.26577431 0.156693183
## RegResid3 -0.836480823 -0.105703585 -0.093072978 0.26020641 0.133147995
## RegResid4 -0.228334154 0.152682158 -0.172055756 0.05743047 -0.018144789
## RegResid5 0.149854940 -0.041501749 0.249418799 0.28171737 0.077007928
## RegResid6 0.029255338 0.429880879 -0.141806708 0.15070618 -0.082240527
## RegResid7 0.075989886 0.083213189 0.080810430 0.16651710 -0.126481626
## RegResid8 -0.014367607 0.308450738 -0.047995763 0.07466934 -0.067469746
## RegResid9 0.040417717 -0.236003132 -0.187038250 0.09661896 0.037058026
## RegResid10 -0.140058602 0.083562024 0.029836359 -0.26537510 0.017225049
## RegResid11 0.132179005 -0.048405402 -0.282702887 -0.17680929 0.467172079
## RegResid12 -0.111458062 -0.090716568 0.151369019 -0.22422397 0.088322612
## RegResid13 -0.092367191 -0.334042418 0.001850428 -0.26548385 -0.129823509
## RegResid14 -0.059187440 -0.188461835 0.187694302 -0.03858674 0.198680179
## RegResid15 0.023806456 0.106647860 0.178310533 0.15666552 -0.157009984
## RegResid16 0.138826887 -0.008569834 -0.041765252 0.33645857 0.130825324
## RegResid17 -0.010982799 0.097762911 0.374866711 0.20551831 -0.009191890
## RegResid18 0.138233751 -0.143903945 -0.080778416 0.32177526 0.031075399
## RegResid19 -0.004516778 0.096253356 0.110942600 0.11088303 -0.154328528
## RegResid20 0.082999909 -0.359259162 -0.347261423 0.14054350 -0.222574282
## RegResid21 0.137313480 0.317676924 -0.426012971 -0.02384011 0.241956426
## RegResid22 0.049355605 -0.179983173 0.003784677 -0.03005603 0.025380873
## RegResid23 0.069117558 -0.056307407 -0.284897056 -0.11284654 -0.136608037
## RegResid24 0.081440441 -0.031979320 0.093315630 0.01220546 0.203220078
## RegResid25 -0.035965535 0.126836781 -0.021527459 -0.22312116 -0.331228790
## RegResid26 0.080969384 -0.049203257 -0.023211583 0.02095752 -0.120723996
## RegResid27 -0.013205676 0.060008452 -0.096789782 -0.18017103 -0.338807532
## RegResid28 0.102886487 -0.208350253 0.022238265 -0.01207600 -0.293067798
## RegResid29 0.148049878 -0.044255174 0.137577299 -0.10743769 0.093104415
## RegResid30 -0.016318306 0.068197246 0.132471679 -0.08828902 0.007796356
## RegResid31 0.095201149 0.117124563 0.119344132 -0.13678334 0.088699801
## RegResid32 -0.026136054 0.037821976 0.158787340 -0.19036512 -0.054997915
sorted.loadings.1 <- loadings[order(loadings[, 1]), 1]
myTitle <- "Loadings Plot for PC1"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.1, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.2 <- loadings[order(loadings[, 2]), 2]
myTitle <- "Loadings Plot for PC2"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.2, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.3 <- loadings[order(loadings[, 3]), 3]
myTitle <- "Loadings Plot for PC3"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.3, main=myTitle, xlab=myXlab, cex=1.5, col="red")
sorted.loadings.4 <- loadings[order(loadings[, 4]), 4]
myTitle <- "Loadings Plot for PC4"
myXlab <- "Variable Loadings"
dotchart(sorted.loadings.4, main=myTitle, xlab=myXlab, cex=1.5, col="red")
form.df$Basin <- as.factor(form.df$Basin)
### ZONE
(pc1V <- leveneTest(Dim.1~Zone, data=form.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.7842 0.0427 *
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc2V <- leveneTest(Dim.2~Zone, data=form.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 6.107 0.0005864 ***
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc3V <- leveneTest(Dim.3~Zone, data=form.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.6423 0.05124 .
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc4V <- leveneTest(Dim.4~Zone, data=form.df))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 1.4063 0.243
## 159
form.df %>%
group_by(Zone) %>%
summarise(sd_dim1=sd(Dim.1, na.rm = TRUE),
sd_dim2=sd(Dim.2, na.rm = TRUE),
sd_dim3=sd(Dim.3, na.rm = TRUE),
sd_dim4=sd(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Zone sd_dim1 sd_dim2 sd_dim3 sd_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Z1 0.0200 0.0204 0.0153 0.0114
## 2 Z2 0.0273 0.00951 0.00961 0.0141
## 3 Z3 0.0256 0.0123 0.0134 0.0116
## 4 Z4 0.0124 0.0104 0.0126 0.0101
form.df %>%
group_by(Zone) %>%
summarise(var_dim1=var(Dim.1, na.rm = TRUE),
var_dim2=var(Dim.2, na.rm = TRUE),
var_dim3=var(Dim.3, na.rm = TRUE),
var_dim4=var(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Zone var_dim1 var_dim2 var_dim3 var_dim4
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Z1 0.000401 0.000414 0.000234 0.000131
## 2 Z2 0.000744 0.0000904 0.0000924 0.000200
## 3 Z3 0.000655 0.000151 0.000180 0.000133
## 4 Z4 0.000153 0.000109 0.000158 0.000101
### BASIN
(pc1V <- leveneTest(Dim.1~Basin, data=form.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.8779 0.03785 *
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(pc2V <- leveneTest(Dim.2~Basin, data=form.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.8159 0.4869
## 159
(pc3V <- leveneTest(Dim.3~Basin, data=form.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.777 0.5084
## 159
(pc4V <- leveneTest(Dim.4~Basin, data=form.df))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 2.658 0.05022 .
## 159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
form.df %>%
group_by(Basin) %>%
summarise(sd_dim1=sd(Dim.1, na.rm = TRUE),
sd_dim2=sd(Dim.2, na.rm = TRUE),
sd_dim3=sd(Dim.3, na.rm = TRUE),
sd_dim4=sd(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Basin sd_dim1 sd_dim2 sd_dim3 sd_dim4
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 7040048420 0.0261 0.0112 0.0128 0.0126
## 2 7040049270 0.0180 0.0106 0.0132 0.0105
## 3 7040049280 0.0152 0.00889 0.0131 0.0145
## 4 7040049990 0.0179 0.0120 0.00842 0.00616
form.df %>%
group_by(Basin) %>%
summarise(var_dim1=var(Dim.1, na.rm = TRUE),
var_dim2=var(Dim.2, na.rm = TRUE),
var_dim3=var(Dim.3, na.rm = TRUE),
var_dim4=var(Dim.4, na.rm = TRUE))
## # A tibble: 4 × 5
## Basin var_dim1 var_dim2 var_dim3 var_dim4
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 7040048420 0.000680 0.000125 0.000163 0.000160
## 2 7040049270 0.000325 0.000112 0.000174 0.000110
## 3 7040049280 0.000232 0.0000791 0.000172 0.000211
## 4 7040049990 0.000321 0.000144 0.0000709 0.0000379
plot1.form<- autoplot(PCA.form, data = form.df, colour="Zone", loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot1.form
plot2.form<- autoplot(PCA.form, x=2, y=3, data = form.df, colour='Zone', loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot2.form
plot3.form<- autoplot(PCA.form, x=3, y=4, data = form.df, colour='Zone', loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot3.form
plot4.form<- autoplot(PCA.form, data = form.df, colour="Basin", loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot4.form
plot5.form<- autoplot(PCA.form, x=2, y=3, data = form.df, colour='Basin', loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot5.form
plot6.form<- autoplot(PCA.form, x=3, y=4, data = form.df, colour='Basin', loadings=FALSE, loadings.label=FALSE, frame=TRUE, frame.type='norm')+scale_color_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ scale_fill_manual(values=c("#E1AF00","#78B7C5", "#9986A5", "#902012"))+ ggtitle("PCA Plot of Morphology traits") + theme_classic()
plot6.form
As expected, we see that Amazons are intermediate to both parents in overall shape. This is most evident in PC1; in PCs 2-4 there is considerable overlap between the species, likely because those PCs explain much less of the variation and so carry less of what distinguishes the groups.
When we perform our Levene’s test, we see no significant difference in variation among the species for PC1, but a significant difference for PC2 (p=0.0006) and PC3 (p=0.0135). Landmarks 3, 9, and 10 all load heavily on PC2 and correspond to dorsal position and body depth; sailfins vary more than amazons and atlantics for PC2. For PC3, only landmark 2 loads heavily (all others are fairly similar), which could correspond to dorsal position and head depth; here amazons are more variable than the two parental species. For PC4, landmarks 3 and 11 load heavily, again corresponding to dorsal position and head depth, with amazons again more variable than the parental species.
It is worth remembering, though, that while PCs 2-4 show differences in variation, and the interesting result of amazons varying more than the parentals, these PCs explain much less variation than PC1, which shows no significant difference.
When looking at geographic differences in variation, we see that variation among zones is significant for PC1 (p=0.003) and PC4 (p=0.019) in sailfin, suggestive for PC3 (p=0.054), and not significant for PC2 (p=0.558). For PC1, zone 2 is most variable, whereas for PC4, zone 3 is most variable. Zone 3 is also slightly more variable in PC3. This result does NOT hold up when we look at basin: there is only a suggestive difference in variation of PC1 when using basin as our geographic factor (p=0.059), and no significant differences for PC2-4.
For amazons, PC1 (p=0.043) and PC2 (p=0.0006) show significant differences in variation among zones, with a suggestive difference for PC3 (p=0.051) and no significant difference for PC4 (p=0.243). PC1 shows zone 2 as most variable, whereas PC2 shows zone 1 as most variable. Zone 1 is also slightly more variable for PC3. These results only partially hold up when using basin as our geographic factor: PC1 shows a significant difference (p=0.038) and PC4 a suggestive difference (p=0.050), but PC2 and PC3 show no significant difference in variation. For PC1, basin 7040048420 is most variable, whereas basin 7040049280 is most variable for PC4.
While basin produced fewer differences in variation within each species, it did create better grouping than zones. This underscores that basin is probably a better geographic classifier than zone (since our zones differed wildly in latitude/area).
Analyses here will be strictly for comparing digital vs. hand measurements.
To get digital distances: the TPS coordinates were in a long format, and I thought it would be easiest to locate the values needed for the euclidean distance formula in a wide format. So I first cleaned up the original TPS file: I removed all curve data and unnecessary rows (i.e., LM=16, image=) to be left with a column of IDs, a column of landmark labels, and two columns for the coordinates.
Importing that into R, I then used the pivot_wider function to create one column for IDs, 16 columns for the X coordinates of the landmarks, and 16 columns for the Y coordinates (I originally wanted a single coordinates column with two rows per specimen to minimize the number of columns, but this works). I separated this into two dataframes, one with just the X coords and one with just the Y, to make them easier to call.
I calculated the euclidean distance (formula: sqrt((X_LM2 - X_LM1)^2 + (Y_LM2 - Y_LM1)^2)) for most of the same measurements (only missing total length and head width, as these were not possible in the digital photos). For head depth, I wasn’t sure whether 2->11 or 2->12 was closer to what I measured by hand; likewise for body depth, whether it was 3->10 or 3->11. I included both.
I finished all of the calculations, added the values into a dataframe with the IDs from the original TPS file, and exported it to Excel.
Once in Excel, I realized that the values I had were in pixels rather than mm, so I needed a way to convert them in order to compare to the hand measurements. The scale factor (included in the original TPS file) is cm/pixel, so I added this to the Excel file and multiplied each value by it to get cm, then by 10 to get mm.
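The conversion described above can be sketched as a small helper (a sketch only; the function name is hypothetical, and the real conversion was done in Excel):

```r
# The TPS scale factor is cm per pixel, so a pixel distance becomes
# millimeters by multiplying by the scale factor and then by 10.
px_to_mm <- function(dist_px, scale_cm_per_px) {
  dist_px * scale_cm_per_px * 10
}
px_to_mm(1200, 0.0025)  # a 1200-pixel distance at 0.0025 cm/px is 30 mm
```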
Below is for reference only. For details on how I formatted the data, see the original geomorph rmd.
HL.dig <- sqrt((x$X.COR_LM2 - x$X.COR_LM1)^2 + (y$Y.COR_LM2 - y$Y.COR_LM1)^2)
SL.dig <- sqrt((x$X.COR_LM6 - x$X.COR_LM1)^2 + (y$Y.COR_LM6 - y$Y.COR_LM1)^2)
PreDL.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM1)^2 + (y$Y.COR_LM3 - y$Y.COR_LM1)^2)
DbL.dig <- sqrt((x$X.COR_LM4 - x$X.COR_LM3)^2 + (y$Y.COR_LM4 - y$Y.COR_LM3)^2)
CPD.dig <- sqrt((x$X.COR_LM7 - x$X.COR_LM5)^2 + (y$Y.COR_LM7 - y$Y.COR_LM5)^2)
HD1.dig <- sqrt((x$X.COR_LM11 - x$X.COR_LM2)^2 + (y$Y.COR_LM11 - y$Y.COR_LM2)^2)
HD2.dig <- sqrt((x$X.COR_LM12 - x$X.COR_LM2)^2 + (y$Y.COR_LM12 - y$Y.COR_LM2)^2)
CPL.dig <- sqrt((x$X.COR_LM8 - x$X.COR_LM6)^2 + (y$Y.COR_LM8 - y$Y.COR_LM6)^2)
BD1.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM10)^2 + (y$Y.COR_LM3 - y$Y.COR_LM10)^2)
BD2.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM11)^2 + (y$Y.COR_LM3 - y$Y.COR_LM11)^2)
SnL.dig <- sqrt((x$X.COR_LM15 - x$X.COR_LM1)^2 + (y$Y.COR_LM15 - y$Y.COR_LM1)^2)
OL.dig <- sqrt((x$X.COR_LM16 - x$X.COR_LM15)^2 + (y$Y.COR_LM16 - y$Y.COR_LM15)^2)
library(tidyverse)
library(curl)
raw <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/comparison_data2.csv")
raw <- read.csv(raw, header = TRUE, sep = ",", stringsAsFactors = TRUE)
#AD20-082 is missing all manual values, so I will remove it to avoid issues with NAs
raw<- raw[raw$ID !="AD20-082", ]
I will use Shapiro-wilk, histograms, and QQ plots to determine what traits are normal.
Will separate my raw dataset into the two factors of interest: digital and manual measurements.
library(dplyr)
digital <- raw %>%
select(ID, SPP, ends_with(".dig"))
manual <- raw %>%
select(ID, SPP, ends_with(".man"))
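The per-trait shapiro.test() calls below could also be condensed by applying the test across columns at once; a sketch with simulated data (in the real workflow, digital[, -(1:2)] would take the place of traits):

```r
# One Shapiro-Wilk p-value per measurement column via sapply().
set.seed(1)
traits <- data.frame(SL = rlnorm(100), OL = rnorm(100))  # simulated stand-ins
sw_pvals <- sapply(traits, function(x) shapiro.test(x)$p.value)
names(which(sw_pvals < 0.05))  # traits that deviate from normality
```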
All but OL fail the SW test, with a slight skew/deviation to the right.
##### Shapiro-Wilk test #####
shapiro.test(digital$SL)
##
## Shapiro-Wilk normality test
##
## data: digital$SL
## W = 0.97473, p-value = 1.163e-05
shapiro.test(digital$BD1)
##
## Shapiro-Wilk normality test
##
## data: digital$BD1
## W = 0.98196, p-value = 0.0002964
shapiro.test(digital$BD2)
##
## Shapiro-Wilk normality test
##
## data: digital$BD2
## W = 0.9907, p-value = 0.03069
shapiro.test(digital$CPD)
##
## Shapiro-Wilk normality test
##
## data: digital$CPD
## W = 0.96472, p-value = 2.595e-07
shapiro.test(digital$CPL)
##
## Shapiro-Wilk normality test
##
## data: digital$CPL
## W = 0.95444, p-value = 9.411e-09
shapiro.test(digital$PreDL)
##
## Shapiro-Wilk normality test
##
## data: digital$PreDL
## W = 0.98153, p-value = 0.0002403
shapiro.test(digital$DbL)
##
## Shapiro-Wilk normality test
##
## data: digital$DbL
## W = 0.9864, p-value = 0.00282
shapiro.test(digital$HL)
##
## Shapiro-Wilk normality test
##
## data: digital$HL
## W = 0.95243, p-value = 5.179e-09
shapiro.test(digital$HD1)
##
## Shapiro-Wilk normality test
##
## data: digital$HD1
## W = 0.9765, p-value = 2.458e-05
shapiro.test(digital$HD2)
##
## Shapiro-Wilk normality test
##
## data: digital$HD2
## W = 0.97148, p-value = 3.137e-06
shapiro.test(digital$SnL)
##
## Shapiro-Wilk normality test
##
## data: digital$SnL
## W = 0.97218, p-value = 4.143e-06
shapiro.test(digital$OL)
##
## Shapiro-Wilk normality test
##
## data: digital$OL
## W = 0.99609, p-value = 0.5701
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(digital$SL, breaks=30)
hist(digital$BD1, breaks=30)
hist(digital$BD2, breaks=30)
hist(digital$CPD, breaks=30)
hist(digital$CPL, breaks=30)
hist(digital$PreDL, breaks=30)
hist(digital$DbL, breaks=30)
hist(digital$HL, breaks=30)
hist(digital$HD1, breaks=30)
hist(digital$HD2, breaks=30)
hist(digital$SnL, breaks=30)
hist(digital$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(digital$SL)
qqline(digital$SL)
qqnorm(digital$BD1)
qqline(digital$BD1)
qqnorm(digital$BD2)
qqline(digital$BD2)
qqnorm(digital$CPD)
qqline(digital$CPD)
qqnorm(digital$CPL)
qqline(digital$CPL)
qqnorm(digital$PreDL)
qqline(digital$PreDL)
qqnorm(digital$DbL)
qqline(digital$DbL)
qqnorm(digital$HL)
qqline(digital$HL)
qqnorm(digital$HD1)
qqline(digital$HD1)
qqnorm(digital$HD2)
qqline(digital$HD2)
qqnorm(digital$SnL)
qqline(digital$SnL)
qqnorm(digital$OL)
qqline(digital$OL)
All fail the SW test, with a slight skew/deviation to the right.
##### Shapiro-Wilk test #####
shapiro.test(manual$SL)
##
## Shapiro-Wilk normality test
##
## data: manual$SL
## W = 0.98036, p-value = 0.0001385
shapiro.test(manual$BD1)
##
## Shapiro-Wilk normality test
##
## data: manual$BD1
## W = 0.96494, p-value = 2.806e-07
shapiro.test(manual$CPD)
##
## Shapiro-Wilk normality test
##
## data: manual$CPD
## W = 0.97037, p-value = 2.046e-06
shapiro.test(manual$CPL)
##
## Shapiro-Wilk normality test
##
## data: manual$CPL
## W = 0.97427, p-value = 9.619e-06
shapiro.test(manual$PreDL)
##
## Shapiro-Wilk normality test
##
## data: manual$PreDL
## W = 0.98037, p-value = 0.0001391
shapiro.test(manual$DbL)
##
## Shapiro-Wilk normality test
##
## data: manual$DbL
## W = 0.98347, p-value = 0.00062
shapiro.test(manual$HL)
##
## Shapiro-Wilk normality test
##
## data: manual$HL
## W = 0.97032, p-value = 2.009e-06
shapiro.test(manual$HD1)
##
## Shapiro-Wilk normality test
##
## data: manual$HD1
## W = 0.98378, p-value = 0.0007249
shapiro.test(manual$SnL)
##
## Shapiro-Wilk normality test
##
## data: manual$SnL
## W = 0.97615, p-value = 2.113e-05
shapiro.test(manual$OL)
##
## Shapiro-Wilk normality test
##
## data: manual$OL
## W = 0.98828, p-value = 0.007818
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(manual$SL, breaks=30)
hist(manual$BD1, breaks=30)
hist(manual$CPD, breaks=30)
hist(manual$CPL, breaks=30)
hist(manual$PreDL, breaks=30)
hist(manual$DbL, breaks=30)
hist(manual$HL, breaks=30)
hist(manual$HD1, breaks=30)
hist(manual$SnL, breaks=30)
hist(manual$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(manual$SL)
qqline(manual$SL)
qqnorm(manual$BD1)
qqline(manual$BD1)
qqnorm(manual$CPD)
qqline(manual$CPD)
qqnorm(manual$CPL)
qqline(manual$CPL)
qqnorm(manual$PreDL)
qqline(manual$PreDL)
qqnorm(manual$DbL)
qqline(manual$DbL)
qqnorm(manual$HL)
qqline(manual$HL)
qqnorm(manual$HD1)
qqline(manual$HD1)
qqnorm(manual$SnL)
qqline(manual$SnL)
qqnorm(manual$OL)
qqline(manual$OL)
While some values are still significant, log transformations vastly improve normality (e.g., p=2e-16 to p=0.002). The histograms and QQ plots look quite normal, so I will stick with log-transformed values.
Log transformed data sets
dig_trans <- digital
dig_trans[, c(3:14)] <- log(dig_trans[, c(3:14)])
man_trans <- manual
man_trans[, c(3:12)] <- log(man_trans[, c(3:12)])
OL was the only trait that was normal in the raw digital check, but since it was not normal in manual, and I want to compare dig to man, I will transform OL here too (I don’t want to compare raw to transformed measurements).
##### Shapiro-Wilk test #####
shapiro.test(dig_trans$SL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$SL
## W = 0.99403, p-value = 0.2053
shapiro.test(dig_trans$BD1)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$BD1
## W = 0.99381, p-value = 0.1811
shapiro.test(dig_trans$BD2)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$BD2
## W = 0.98978, p-value = 0.01812
shapiro.test(dig_trans$CPD)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$CPD
## W = 0.99187, p-value = 0.06013
shapiro.test(dig_trans$CPL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$CPL
## W = 0.99116, p-value = 0.03993
shapiro.test(dig_trans$PreDL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$PreDL
## W = 0.99247, p-value = 0.08521
shapiro.test(dig_trans$DbL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$DbL
## W = 0.98426, p-value = 0.0009232
shapiro.test(dig_trans$HL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HL
## W = 0.9913, p-value = 0.04321
shapiro.test(dig_trans$HD1)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HD1
## W = 0.99687, p-value = 0.7577
shapiro.test(dig_trans$HD2)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$HD2
## W = 0.99478, p-value = 0.3066
shapiro.test(dig_trans$SnL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$SnL
## W = 0.99732, p-value = 0.8578
shapiro.test(dig_trans$OL)
##
## Shapiro-Wilk normality test
##
## data: dig_trans$OL
## W = 0.9899, p-value = 0.01938
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(dig_trans$SL, breaks=30)
hist(dig_trans$BD1, breaks=30)
hist(dig_trans$BD2, breaks=30)
hist(dig_trans$CPD, breaks=30)
hist(dig_trans$CPL, breaks=30)
hist(dig_trans$PreDL, breaks=30)
hist(dig_trans$DbL, breaks=30)
hist(dig_trans$HL, breaks=30)
hist(dig_trans$HD1, breaks=30)
hist(dig_trans$HD2, breaks=30)
hist(dig_trans$SnL, breaks=30)
hist(dig_trans$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(dig_trans$SL)
qqline(dig_trans$SL)
qqnorm(dig_trans$BD1)
qqline(dig_trans$BD1)
qqnorm(dig_trans$BD2)
qqline(dig_trans$BD2)
qqnorm(dig_trans$CPD)
qqline(dig_trans$CPD)
qqnorm(dig_trans$CPL)
qqline(dig_trans$CPL)
qqnorm(dig_trans$PreDL)
qqline(dig_trans$PreDL)
qqnorm(dig_trans$DbL)
qqline(dig_trans$DbL)
qqnorm(dig_trans$HL)
qqline(dig_trans$HL)
qqnorm(dig_trans$HD1)
qqline(dig_trans$HD1)
qqnorm(dig_trans$HD2)
qqline(dig_trans$HD2)
qqnorm(dig_trans$SnL)
qqline(dig_trans$SnL)
qqnorm(dig_trans$OL)
qqline(dig_trans$OL)
##### Shapiro-Wilk test #####
shapiro.test(man_trans$SL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$SL
## W = 0.99424, p-value = 0.2297
shapiro.test(man_trans$BD1)
##
## Shapiro-Wilk normality test
##
## data: man_trans$BD1
## W = 0.99567, p-value = 0.4747
shapiro.test(man_trans$CPD)
##
## Shapiro-Wilk normality test
##
## data: man_trans$CPD
## W = 0.99598, p-value = 0.5428
shapiro.test(man_trans$CPL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$CPL
## W = 0.98966, p-value = 0.01693
shapiro.test(man_trans$PreDL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$PreDL
## W = 0.98692, p-value = 0.003722
shapiro.test(man_trans$DbL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$DbL
## W = 0.98849, p-value = 0.008812
shapiro.test(man_trans$HL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$HL
## W = 0.99081, p-value = 0.03262
shapiro.test(man_trans$HD1)
##
## Shapiro-Wilk normality test
##
## data: man_trans$HD1
## W = 0.99264, p-value = 0.09385
shapiro.test(man_trans$SnL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$SnL
## W = 0.99033, p-value = 0.02488
shapiro.test(man_trans$OL)
##
## Shapiro-Wilk normality test
##
## data: man_trans$OL
## W = 0.97981, p-value = 0.0001072
##### Histograms #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
hist(man_trans$SL, breaks=30)
hist(man_trans$BD1, breaks=30)
hist(man_trans$CPD, breaks=30)
hist(man_trans$CPL, breaks=30)
hist(man_trans$PreDL, breaks=30)
hist(man_trans$DbL, breaks=30)
hist(man_trans$HL, breaks=30)
hist(man_trans$HD1, breaks=30)
hist(man_trans$SnL, breaks=30)
hist(man_trans$OL, breaks=30)
##### QQ plots #####
par(mfcol=c(2,2), oma = c(0,0,2,0))
qqnorm(man_trans$SL)
qqline(man_trans$SL)
qqnorm(man_trans$BD1)
qqline(man_trans$BD1)
qqnorm(man_trans$CPD)
qqline(man_trans$CPD)
qqnorm(man_trans$CPL)
qqline(man_trans$CPL)
qqnorm(man_trans$PreDL)
qqline(man_trans$PreDL)
qqnorm(man_trans$DbL)
qqline(man_trans$DbL)
qqnorm(man_trans$HL)
qqline(man_trans$HL)
qqnorm(man_trans$HD1)
qqline(man_trans$HD1)
qqnorm(man_trans$SnL)
qqline(man_trans$SnL)
qqnorm(man_trans$OL)
qqline(man_trans$OL)
I will perform 3 different analyses: Bland-Altman, MANOVA, and PCA. These vary in whether they need long or wide formats. Also, the raw data contains columns that are not needed for these analyses (e.g., location or photo date), so I will remove those to leave just the species, tag ID, and measurements.
Additionally, since we have two measurements for BD and HD, we will create two datasets: one with BD1 and HD1, and another with BD2 and HD2 (this applies to digital only, as manual only has one measurement for each). Once we run BA plots, we can determine which of these measurements is best to continue analysis with.
Long format
library(tidyr)
#log transform entire data set
raw2 <- raw
raw2[, c(3:24)] <- log(raw2[, c(3:24)])
# this will create a data frame with species, ID, characteristic, method and value
df_long <- raw2 %>%
pivot_longer(
cols = ends_with(".dig") | ends_with(".man"),
names_to = c("characteristic", "method"),
names_sep = "\\.",
values_to = "value"
)
# method and characteristic are initially a character; need to be a factor
df_long$method <- as.factor(df_long$method)
df_long$characteristic <- as.factor(df_long$characteristic)
#create data frames that only have one measurement for BD and HD
## BD1 and HD1 for both digital and manual measurements
df_long2 <- df_long[-grep("BD2", df_long$characteristic),]
df_long2 <- df_long2[-grep("HD2", df_long2$characteristic),]#only contains BD1 & HD1
## BD1/HD1 for manual (since they only have one) and BD2/HD2 for digital
df_long3 <- df_long %>%
filter(
# Keep everything for Hand measurements
method == "man" |
# Keep all digital measurements except BD1 and HD1
!(method == "dig" & characteristic %in% c("BD1", "HD1"))
)
#to allow digital BD2/HD2 to pair with manual BD1/HD1 in this new data frame, combine characteristic names to just BD or HD
df_long3<- df_long3 %>%
mutate(
characteristic = case_when(
characteristic %in% c("BD1", "BD2") ~ "BD",
characteristic %in% c("HD1", "HD2") ~ "HD",
TRUE ~ characteristic # Leave other characteristics unchanged
)
)
Wide format
df_wide <- df_long2 %>%
pivot_wider(
names_from = method,
values_from = value
)
df_wide2 <- df_long3 %>%
pivot_wider(
names_from = method,
values_from = value
)
Bland-Altman plots are used to assess agreement between two measurement methods.
We’re first interested in how BD1, HD1, BD2, and HD2 compare, so let’s isolate those.
data_BD1 <- df_wide %>%
filter(characteristic == "BD1")
data_BD1 <- data_BD1 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_HD1 <- df_wide %>%
filter(characteristic == "HD1")
data_HD1 <- data_HD1 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_BD2 <- df_wide2 %>%
filter(characteristic == "BD")
data_BD2 <- data_BD2 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
data_HD2 <- df_wide2 %>%
filter(characteristic == "HD")
data_HD2 <- data_HD2 %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
ba_BD1 <- ggplot(data_BD1, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD1$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "BD1",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_HD1 <- ggplot(data_HD1, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD1$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "HD1",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_BD2 <- ggplot(data_BD2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "BD2",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_HD2 <- ggplot(data_HD2, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD2$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "HD2",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
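The four plot definitions above differ only in the data frame and title; a helper like the following (a sketch, not part of the original analysis) would avoid the repetition:

```r
library(ggplot2)

# Bland-Altman plot: bias (red) plus 95% limits of agreement (blue).
ba_plot <- function(df, title) {
  bias <- mean(df$difference, na.rm = TRUE)
  loa  <- bias + c(-1.96, 1.96) * sd(df$difference, na.rm = TRUE)
  ggplot(df, aes(x = average, y = difference, color = SPP)) +
    geom_point(alpha = 0.6) +
    geom_hline(yintercept = 0) +
    geom_hline(yintercept = bias, color = "red", linetype = "dashed") +
    geom_hline(yintercept = loa, color = "blue", linetype = "dashed") +
    labs(title = title,
         x = "Average of Digital and Hand Measurements",
         y = "Difference (Digital - Hand)") +
    theme_minimal()
}
# e.g. ba_BD1 <- ba_plot(data_BD1, "BD1")
```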
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(ba_BD1, ba_BD2, ba_HD1, ba_HD2)
We see that BD1 has less scatter, fewer points outside the LOA, and a smaller bias than BD2 (0.12 vs 0.36). In addition, BD2’s LOA do not include 0, so BD1 is more appropriate to use. In contrast, while HD1 has less scatter with fewer points outside the LOA, its bias is further from 0 than HD2’s (0.06 vs -0.008), even though HD2 has slightly more scatter and more points outside the LOA; so HD2 is more appropriate to use.
I will therefore create a final dataset with BD1 and HD2 for all subsequent analyses.
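The construction of the final data frame (df_wide_fin, used below) is not shown in these chunks; a sketch, assuming df_long from the reshaping step above and wrapped in a hypothetical helper:

```r
library(dplyr)
library(tidyr)

# Keep BD1 and HD2 as the digital depth measurements, drop BD2 and HD1,
# and collapse the names to BD/HD so they pair with the single manual
# measurement of each.
build_wide_fin <- function(df_long) {
  df_long %>%
    filter(method == "man" |
             !(method == "dig" & characteristic %in% c("BD2", "HD1"))) %>%
    mutate(characteristic = case_when(
      characteristic %in% c("BD1", "BD2") ~ "BD",
      characteristic %in% c("HD1", "HD2") ~ "HD",
      TRUE ~ as.character(characteristic)
    )) %>%
    pivot_wider(names_from = method, values_from = value)
}
# df_wide_fin <- build_wide_fin(df_long)
```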
We see major clusters on the BA plot for all the data. Coloring by species does not differentiate the clusters, so it is not a species difference we’re seeing. Coloring by characteristic, however, corresponds nicely to the clustering. This is due to differences in the ranges of the measurements we used (e.g., standard length can range from 14-40 mm, but snout length ranges from 1-3 mm).
df_ba_fin <- df_wide_fin %>%
mutate(
average = (dig + man) / 2,
difference = dig - man
)
ba_full_fin <- ggplot(df_ba_fin, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba_fin$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_fin
##### Species
ba_full_SPP_fin <- ggplot(df_ba_fin, aes(x = average, y = difference, color=SPP)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba_fin$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_SPP_fin
##### Characteristics
ba_full_CR_fin <- ggplot(df_ba_fin, aes(x = average, y = difference, color=characteristic)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba_fin$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
theme_minimal()
ba_full_CR_fin
ba_by_char_fin <- ggplot(df_ba_fin, aes(x = average, y = difference)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "solid", color = "black") +
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE), color = "red", linetype = "dashed") + # Mean line
geom_hline(yintercept = mean(df_ba_fin$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba_fin$difference, na.rm = TRUE),
color = "blue", linetype = "dashed") + # Limits of agreement
labs(title = "Bland-Altman Plots for All Characteristics",
x = "Average of Digital and Hand Measurements",
y = "Difference (Digital - Hand)") +
facet_wrap(~ characteristic, scales = "free") +
theme_minimal()
ba_by_char_fin
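The four nearly identical `ggplot` calls above could be collapsed into a small helper to avoid repetition. A minimal sketch, assuming the input data frame already has `average` and `difference` columns as computed above (the function name `ba_plot` is ours, not from the original analysis):

```r
# Sketch (ours): reusable Bland-Altman plot helper; assumes columns
# `average` and `difference` already exist in the data frame
library(ggplot2)

ba_plot <- function(df, title) {
  bias <- mean(df$difference, na.rm = TRUE)
  s    <- sd(df$difference, na.rm = TRUE)
  ggplot(df, aes(x = average, y = difference)) +
    geom_point(alpha = 0.6) +
    geom_hline(yintercept = 0, linetype = "solid", color = "black") +
    geom_hline(yintercept = bias, color = "red", linetype = "dashed") +  # mean difference
    geom_hline(yintercept = bias + c(-1.96, 1.96) * s,
               color = "blue", linetype = "dashed") +                    # limits of agreement
    labs(title = title,
         x = "Average of Digital and Hand Measurements",
         y = "Difference (Digital - Hand)") +
    theme_minimal()
}
```

With this helper, each of the plots above reduces to a one-liner such as `ba_plot(df_ba_fin, "Bland-Altman Plots for All Characteristics")`.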
We can extract the important information (bias, LOAs, confidence intervals, and the p-value from a two-sided t-test of whether the bias differs from 0) using the blandr package. We contacted the package author; there may be a way to loop this, but for now we run the statistics for each characteristic individually.
library(blandr)
## Warning: package 'blandr' was built under R version 4.4.2
### Whole dataset
(bar_full <- blandr.statistics(df_wide_fin$dig, df_wide_fin$man))
## Bland-Altman Statistics
## =======================
## t = 2.6493, df = 3389, p-value = 0.008104
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 3390
## Maximum value for average measures: 4.042436
## Minimum value for average measures: 0.3834636
## Maximum value for difference in measures: 0.6657775
## Minimum value for difference in measures: -0.9342704
##
## Bias: 0.006609047
## Standard deviation of bias: 0.145248
##
## Standard error of bias: 0.002494654
## Standard error for limits of agreement: 0.004263812
##
## Bias: 0.006609047
## Bias- upper 95% CI: 0.01150023
## Bias- lower 95% CI: 0.001717868
##
## Upper limit of agreement: 0.2912951
## Upper LOA- upper 95% CI: 0.299655
## Upper LOA- lower 95% CI: 0.2829352
##
## Lower limit of agreement: -0.278077
## Lower LOA- upper 95% CI: -0.2697171
## Lower LOA- lower 95% CI: -0.2864369
##
## =======================
## Derived measures:
## Mean of differences/means: -1.171216
## Point estimate of bias as proportion of lowest average: 1.723514
## Point estimate of bias as proportion of highest average 0.1634917
## Spread of data between lower and upper LoAs: 0.5693721
## Bias as proportion of LoA spread: 1.160761
##
## =======================
## Bias:
## 0.006609047 ( 0.001717868 to 0.01150023 )
## ULoA:
## 0.2912951 ( 0.2829352 to 0.299655 )
## LLoA:
## -0.278077 ( -0.2864369 to -0.2697171 )
### By characteristic
df_SL <- df_wide_fin %>%
filter(characteristic == "SL")
(bar_SL <- blandr.statistics(df_SL$dig, df_SL$man))
## Bland-Altman Statistics
## =======================
## t = 23.921, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 4.042436
## Minimum value for average measures: 2.960844
## Maximum value for difference in measures: 0.1785351
## Minimum value for difference in measures: -0.2129669
##
## Bias: 0.05387388
## Standard deviation of bias: 0.0414674
##
## Standard error of bias: 0.0022522
## Standard error for limits of agreement: 0.003852784
##
## Bias: 0.05387388
## Bias- upper 95% CI: 0.05830397
## Bias- lower 95% CI: 0.04944379
##
## Upper limit of agreement: 0.13515
## Upper LOA- upper 95% CI: 0.1427284
## Upper LOA- lower 95% CI: 0.1275715
##
## Lower limit of agreement: -0.02740223
## Lower LOA- upper 95% CI: -0.01982377
## Lower LOA- lower 95% CI: -0.03498068
##
## =======================
## Derived measures:
## Mean of differences/means: 1.517785
## Point estimate of bias as proportion of lowest average: 1.819545
## Point estimate of bias as proportion of highest average 1.332708
## Spread of data between lower and upper LoAs: 0.1625522
## Bias as proportion of LoA spread: 33.14251
##
## =======================
## Bias:
## 0.05387388 ( 0.04944379 to 0.05830397 )
## ULoA:
## 0.13515 ( 0.1275715 to 0.1427284 )
## LLoA:
## -0.02740223 ( -0.03498068 to -0.01982377 )
df_BD <- df_wide_fin %>%
filter(characteristic == "BD")
(bar_BD <- blandr.statistics(df_BD$dig, df_BD$man))
## Bland-Altman Statistics
## =======================
## t = 30.613, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.121966
## Minimum value for average measures: 1.748914
## Maximum value for difference in measures: 0.4951842
## Minimum value for difference in measures: -0.2418473
##
## Bias: 0.1207772
## Standard deviation of bias: 0.07264127
##
## Standard error of bias: 0.003945332
## Standard error for limits of agreement: 0.006749185
##
## Bias: 0.1207772
## Bias- upper 95% CI: 0.1285377
## Bias- lower 95% CI: 0.1130167
##
## Upper limit of agreement: 0.2631541
## Upper LOA- upper 95% CI: 0.2764298
## Upper LOA- lower 95% CI: 0.2498784
##
## Lower limit of agreement: -0.02159966
## Lower LOA- upper 95% CI: -0.008323962
## Lower LOA- lower 95% CI: -0.03487535
##
## =======================
## Derived measures:
## Mean of differences/means: 4.995503
## Point estimate of bias as proportion of lowest average: 6.905842
## Point estimate of bias as proportion of highest average 3.868628
## Spread of data between lower and upper LoAs: 0.2847538
## Bias as proportion of LoA spread: 42.41462
##
## =======================
## Bias:
## 0.1207772 ( 0.1130167 to 0.1285377 )
## ULoA:
## 0.2631541 ( 0.2498784 to 0.2764298 )
## LLoA:
## -0.02159966 ( -0.03487535 to -0.008323962 )
df_CPD <- df_wide_fin %>%
filter(characteristic == "CPD")
(bar_CPD <- blandr.statistics(df_CPD$dig, df_CPD$man))
## Bland-Altman Statistics
## =======================
## t = 27.399, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.471218
## Minimum value for average measures: 1.170403
## Maximum value for difference in measures: 0.5562302
## Minimum value for difference in measures: -0.1550393
##
## Bias: 0.1015749
## Standard deviation of bias: 0.06825759
##
## Standard error of bias: 0.003707244
## Standard error for limits of agreement: 0.006341892
##
## Bias: 0.1015749
## Bias- upper 95% CI: 0.1088671
## Bias- lower 95% CI: 0.09428274
##
## Upper limit of agreement: 0.2353598
## Upper LOA- upper 95% CI: 0.2478344
## Upper LOA- lower 95% CI: 0.2228853
##
## Lower limit of agreement: -0.03220997
## Lower LOA- upper 95% CI: -0.01973542
## Lower LOA- lower 95% CI: -0.04468452
##
## =======================
## Derived measures:
## Mean of differences/means: 5.715414
## Point estimate of bias as proportion of lowest average: 8.678624
## Point estimate of bias as proportion of highest average 4.110318
## Spread of data between lower and upper LoAs: 0.2675698
## Bias as proportion of LoA spread: 37.96203
##
## =======================
## Bias:
## 0.1015749 ( 0.09428274 to 0.1088671 )
## ULoA:
## 0.2353598 ( 0.2228853 to 0.2478344 )
## LLoA:
## -0.03220997 ( -0.04468452 to -0.01973542 )
df_CPL <- df_wide_fin %>%
filter(characteristic == "CPL")
(bar_CPL <- blandr.statistics(df_CPL$dig, df_CPL$man))
## Bland-Altman Statistics
## =======================
## t = 30.74, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.890711
## Minimum value for average measures: 1.76216
## Maximum value for difference in measures: 0.3986366
## Minimum value for difference in measures: -0.19231
##
## Bias: 0.1408985
## Standard deviation of bias: 0.08439277
##
## Standard error of bias: 0.004583586
## Standard error for limits of agreement: 0.00784103
##
## Bias: 0.1408985
## Bias- upper 95% CI: 0.1499145
## Bias- lower 95% CI: 0.1318826
##
## Upper limit of agreement: 0.3063083
## Upper LOA- upper 95% CI: 0.3217317
## Upper LOA- lower 95% CI: 0.290885
##
## Lower limit of agreement: -0.02451129
## Lower LOA- upper 95% CI: -0.00908793
## Lower LOA- lower 95% CI: -0.03993466
##
## =======================
## Derived measures:
## Mean of differences/means: 5.963591
## Point estimate of bias as proportion of lowest average: 7.995785
## Point estimate of bias as proportion of highest average 4.874182
## Spread of data between lower and upper LoAs: 0.3308196
## Bias as proportion of LoA spread: 42.59074
##
## =======================
## Bias:
## 0.1408985 ( 0.1318826 to 0.1499145 )
## ULoA:
## 0.3063083 ( 0.290885 to 0.3217317 )
## LLoA:
## -0.02451129 ( -0.03993466 to -0.00908793 )
df_PreDL <- df_wide_fin %>%
filter(characteristic == "PreDL")
(bar_PreDL <- blandr.statistics(df_PreDL$dig, df_PreDL$man))
## Bland-Altman Statistics
## =======================
## t = 5.1333, df = 338, p-value = 4.813e-07
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 3.442207
## Minimum value for average measures: 2.311603
## Maximum value for difference in measures: 0.6657775
## Minimum value for difference in measures: -0.3568127
##
## Bias: 0.01639195
## Standard deviation of bias: 0.05879367
##
## Standard error of bias: 0.003193234
## Standard error for limits of agreement: 0.005462588
##
## Bias: 0.01639195
## Bias- upper 95% CI: 0.02267307
## Bias- lower 95% CI: 0.01011084
##
## Upper limit of agreement: 0.1316276
## Upper LOA- upper 95% CI: 0.1423725
## Upper LOA- lower 95% CI: 0.1208826
##
## Lower limit of agreement: -0.09884364
## Lower LOA- upper 95% CI: -0.08809869
## Lower LOA- lower 95% CI: -0.1095886
##
## =======================
## Derived measures:
## Mean of differences/means: 0.5743082
## Point estimate of bias as proportion of lowest average: 0.7091164
## Point estimate of bias as proportion of highest average 0.4762048
## Spread of data between lower and upper LoAs: 0.2304712
## Bias as proportion of LoA spread: 7.112366
##
## =======================
## Bias:
## 0.01639195 ( 0.01011084 to 0.02267307 )
## ULoA:
## 0.1316276 ( 0.1208826 to 0.1423725 )
## LLoA:
## -0.09884364 ( -0.1095886 to -0.08809869 )
df_DbL <- df_wide_fin %>%
filter(characteristic == "DbL")
(bar_DbL <- blandr.statistics(df_DbL$dig, df_DbL$man))
## Bland-Altman Statistics
## =======================
## t = -2.844, df = 338, p-value = 0.004726
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.786619
## Minimum value for average measures: 1.117524
## Maximum value for difference in measures: 0.2886718
## Minimum value for difference in measures: -0.9342704
##
## Bias: -0.01723621
## Standard deviation of bias: 0.1115851
##
## Standard error of bias: 0.00606047
## Standard error for limits of agreement: 0.0103675
##
## Bias: -0.01723621
## Bias- upper 95% CI: -0.005315222
## Bias- lower 95% CI: -0.0291572
##
## Upper limit of agreement: 0.2014705
## Upper LOA- upper 95% CI: 0.2218635
## Upper LOA- lower 95% CI: 0.1810776
##
## Lower limit of agreement: -0.235943
## Lower LOA- upper 95% CI: -0.21555
## Lower LOA- lower 95% CI: -0.2563359
##
## =======================
## Derived measures:
## Mean of differences/means: -1.003657
## Point estimate of bias as proportion of lowest average: -1.542357
## Point estimate of bias as proportion of highest average -0.618535
## Spread of data between lower and upper LoAs: 0.4374135
## Bias as proportion of LoA spread: -3.940484
##
## =======================
## Bias:
## -0.01723621 ( -0.0291572 to -0.005315222 )
## ULoA:
## 0.2014705 ( 0.1810776 to 0.2218635 )
## LLoA:
## -0.235943 ( -0.2563359 to -0.21555 )
df_HL <- df_wide_fin %>%
filter(characteristic == "HL")
(bar_HL <- blandr.statistics(df_HL$dig, df_HL$man))
## Bland-Altman Statistics
## =======================
## t = -16.152, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.565974
## Minimum value for average measures: 1.480514
## Maximum value for difference in measures: 0.528404
## Minimum value for difference in measures: -0.7045543
##
## Bias: -0.1433862
## Standard deviation of bias: 0.1634505
##
## Standard error of bias: 0.008877412
## Standard error for limits of agreement: 0.01518638
##
## Bias: -0.1433862
## Bias- upper 95% CI: -0.1259242
## Bias- lower 95% CI: -0.1608481
##
## Upper limit of agreement: 0.1769768
## Upper LOA- upper 95% CI: 0.2068485
## Upper LOA- lower 95% CI: 0.1471051
##
## Lower limit of agreement: -0.4637492
## Lower LOA- upper 95% CI: -0.4338774
## Lower LOA- lower 95% CI: -0.4936209
##
## =======================
## Derived measures:
## Mean of differences/means: -7.262578
## Point estimate of bias as proportion of lowest average: -9.684893
## Point estimate of bias as proportion of highest average -5.587982
## Spread of data between lower and upper LoAs: 0.6407259
## Bias as proportion of LoA spread: -22.37871
##
## =======================
## Bias:
## -0.1433862 ( -0.1608481 to -0.1259242 )
## ULoA:
## 0.1769768 ( 0.1471051 to 0.2068485 )
## LLoA:
## -0.4637492 ( -0.4936209 to -0.4338774 )
df_HD <- df_wide_fin %>%
filter(characteristic == "HD")
(bar_HD <- blandr.statistics(df_HD$dig, df_HD$man))
## Bland-Altman Statistics
## =======================
## t = -1.37, df = 338, p-value = 0.1716
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 2.566481
## Minimum value for average measures: 1.391822
## Maximum value for difference in measures: 0.4478337
## Minimum value for difference in measures: -0.2905665
##
## Bias: -0.007854264
## Standard deviation of bias: 0.1055547
##
## Standard error of bias: 0.005732943
## Standard error for limits of agreement: 0.009807207
##
## Bias: -0.007854264
## Bias- upper 95% CI: 0.003422476
## Bias- lower 95% CI: -0.019131
##
## Upper limit of agreement: 0.1990329
## Upper LOA- upper 95% CI: 0.2183237
## Upper LOA- lower 95% CI: 0.179742
##
## Lower limit of agreement: -0.2147414
## Lower LOA- upper 95% CI: -0.1954506
## Lower LOA- lower 95% CI: -0.2340323
##
## =======================
## Derived measures:
## Mean of differences/means: -0.4189375
## Point estimate of bias as proportion of lowest average: -0.5643152
## Point estimate of bias as proportion of highest average -0.3060324
## Spread of data between lower and upper LoAs: 0.4137743
## Bias as proportion of LoA spread: -1.8982
##
## =======================
## Bias:
## -0.007854264 ( -0.019131 to 0.003422476 )
## ULoA:
## 0.1990329 ( 0.179742 to 0.2183237 )
## LLoA:
## -0.2147414 ( -0.2340323 to -0.1954506 )
df_SnL <- df_wide_fin %>%
filter(characteristic == "SnL")
(bar_SnL <- blandr.statistics(df_SnL$dig, df_SnL$man))
## Bland-Altman Statistics
## =======================
## t = -22.083, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 1.547597
## Minimum value for average measures: 0.3834636
## Maximum value for difference in measures: 0.2909172
## Minimum value for difference in measures: -0.7671152
##
## Bias: -0.1897899
## Standard deviation of bias: 0.1582401
##
## Standard error of bias: 0.008594423
## Standard error for limits of agreement: 0.01470227
##
## Bias: -0.1897899
## Bias- upper 95% CI: -0.1728846
## Bias- lower 95% CI: -0.2066952
##
## Upper limit of agreement: 0.1203607
## Upper LOA- upper 95% CI: 0.1492802
## Upper LOA- lower 95% CI: 0.09144123
##
## Lower limit of agreement: -0.4999405
## Lower LOA- upper 95% CI: -0.471021
## Lower LOA- lower 95% CI: -0.52886
##
## =======================
## Derived measures:
## Mean of differences/means: -21.11533
## Point estimate of bias as proportion of lowest average: -49.4936
## Point estimate of bias as proportion of highest average -12.26353
## Spread of data between lower and upper LoAs: 0.6203012
## Bias as proportion of LoA spread: -30.59641
##
## =======================
## Bias:
## -0.1897899 ( -0.2066952 to -0.1728846 )
## ULoA:
## 0.1203607 ( 0.09144123 to 0.1492802 )
## LLoA:
## -0.4999405 ( -0.52886 to -0.471021 )
df_OL <- df_wide_fin %>%
filter(characteristic == "OL")
(bar_OL <- blandr.statistics(df_OL$dig, df_OL$man))
## Bland-Altman Statistics
## =======================
## t = -1.7654, df = 338, p-value = 0.0784
## alternative hypothesis: true bias is not equal to 0
##
## =======================
## Number of comparisons: 339
## Maximum value for average measures: 1.471467
## Minimum value for average measures: 0.6198813
## Maximum value for difference in measures: 0.4116851
## Minimum value for difference in measures: -0.3320279
##
## Bias: -0.009159473
## Standard deviation of bias: 0.09552654
##
## Standard error of bias: 0.005188289
## Standard error for limits of agreement: 0.008875481
##
## Bias: -0.009159473
## Bias- upper 95% CI: 0.00104593
## Bias- lower 95% CI: -0.01936488
##
## Upper limit of agreement: 0.1780725
## Upper LOA- upper 95% CI: 0.1955307
## Upper LOA- lower 95% CI: 0.1606144
##
## Lower limit of agreement: -0.1963915
## Lower LOA- upper 95% CI: -0.1789333
## Lower LOA- lower 95% CI: -0.2138496
##
## =======================
## Derived measures:
## Mean of differences/means: -0.6782537
## Point estimate of bias as proportion of lowest average: -1.477617
## Point estimate of bias as proportion of highest average -0.6224722
## Spread of data between lower and upper LoAs: 0.374464
## Bias as proportion of LoA spread: -2.446022
##
## =======================
## Bias:
## -0.009159473 ( -0.01936488 to 0.00104593 )
## ULoA:
## 0.1780725 ( 0.1606144 to 0.1955307 )
## LLoA:
## -0.1963915 ( -0.2138496 to -0.1789333 )
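The per-characteristic calls above could also be collapsed into a loop. A sketch with toy data standing in for `df_wide_fin`; the component names `bias`, `lowerLOA`, and `upperLOA` are our reading of the blandr return object and should be checked against the package documentation:

```r
# Sketch (ours): loop blandr.statistics over characteristics instead of
# repeating the call; toy data stands in for df_wide_fin here
library(blandr)

set.seed(1)
toy <- data.frame(
  characteristic = rep(c("SL", "BD"), each = 30),
  dig = rnorm(60, mean = 3, sd = 0.2),
  man = rnorm(60, mean = 3, sd = 0.2)
)

chars <- unique(toy$characteristic)
ba_list <- lapply(chars, function(ch) {
  d <- toy[toy$characteristic == ch, ]
  blandr.statistics(d$dig, d$man)
})
names(ba_list) <- chars

# collect bias and limits of agreement into one summary table
ba_summary <- data.frame(
  characteristic = chars,
  bias      = sapply(ba_list, `[[`, "bias"),
  lower_LOA = sapply(ba_list, `[[`, "lowerLOA"),
  upper_LOA = sapply(ba_list, `[[`, "upperLOA")
)
ba_summary
```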
While the p-values for almost all of the characteristics are significant, the bias itself is extremely close to zero in each case. The significant t-tests are likely a consequence of very low overall variability, which makes even tiny deviations statistically detectable; they are not methodologically relevant (i.e., there is no actual difference between the methods).
We will run a repeated measures MANOVA on our data to corroborate our Bland-Altman results. We will follow up our MANOVA with an effect size calculation to see whether any statistical significance is methodologically relevant.
First, we need a data frame with two rows per individual (one for digital and one for manual), plus 10 columns corresponding to each characteristic. We will create new long/wide formats since the ones above don’t have separate columns for the characteristics.
library(car)
raw3 <- raw2 %>%
rename(HD.dig = HD2.dig, HD.man = HD1.man)
raw3 <- raw3[-5]  #removes BD2
raw3 <- raw3[-10] #removes HD1 for digital
df_long_ex <- raw3 %>%
pivot_longer(cols = -c(ID, SPP),
names_to = "Measurement",
values_to = "Value") %>%
mutate(Method = ifelse(str_detect(Measurement, "\\.dig"), "Digital", "Manual"), #escape the dot so it matches literally
Character = str_remove_all(Measurement, "\\.dig|\\.man")) %>%
select(ID, Method, Character, Value)
df_wide_ex <- df_long_ex %>%
pivot_wider(names_from = Character, values_from = Value)
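The reshape can be sanity-checked on a toy example; `SL.dig`/`SL.man` here are hypothetical columns in the style of `raw3`, and each ID should come out twice, once per method:

```r
# Toy check (ours) of the reshape above: paired .dig/.man columns become
# a Method column plus one column per characteristic
library(tidyr); library(dplyr); library(stringr)

toy <- data.frame(ID = 1:2, SPP = "A",
                  SL.dig = c(3.1, 3.2), SL.man = c(3.0, 3.3))

toy_wide <- toy %>%
  pivot_longer(cols = -c(ID, SPP),
               names_to = "Measurement", values_to = "Value") %>%
  mutate(Method = ifelse(str_detect(Measurement, "\\.dig"), "Digital", "Manual"),
         Character = str_remove_all(Measurement, "\\.dig|\\.man")) %>%
  select(ID, Method, Character, Value) %>%
  pivot_wider(names_from = Character, values_from = Value)

toy_wide  # four rows: each ID once per Method, with an SL column
```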
Similar to our BA stats, we see a significant but not practical difference between methods: the MANOVA is statistically significant (p<0.001) but has a small effect size (0.0267), meaning that method explains only 2.67% of the variance in our data.
manova_mod <- manova(cbind(SL, BD1, CPD, CPL, PreDL, DbL, HL, HD, SnL, OL) ~ Method + Error(ID/Method), data=df_wide_ex)
summary(manova_mod, test= "Pillai")
##
## Error: ID
## Df Pillai approx F num Df den Df Pr(>F)
## Residuals 338
##
## Error: ID:Method
## Df Pillai approx F num Df den Df Pr(>F)
## Method 1 0.90343 307.77 10 329 < 2.2e-16 ***
## Residuals 338
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# results from summary
pillai_method <- 0.90343
num_df_method <- 10
den_df_method <- 329
# Calculate partial eta-squared for Method
eta_squared_method <- pillai_method / (pillai_method + (den_df_method / num_df_method))
# Print the result
cat("Partial Eta-Squared (Method):", eta_squared_method, "\n")
## Partial Eta-Squared (Method): 0.02672599
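The effect-size calculation can be wrapped in a small function so it is reusable for other model terms. A convenience wrapper of ours, using the same approximation as above (Pillai trace V divided by V plus den_df/num_df):

```r
# Helper (ours): partial eta-squared from a Pillai trace, using the
# same formula as the manual calculation above
partial_eta_sq <- function(pillai, num_df, den_df) {
  pillai / (pillai + den_df / num_df)
}

partial_eta_sq(0.90343, 10, 329)  # ~0.0267, matching the value above
```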
In this analysis, I will compare the principal components after centering and scaling the data. A PCA will help us determine how much the measurement method influences the variation in our data, without worrying about differences in measurement scales.
To draw ellipses around each method, we need a data frame with a Method column.
Let’s create the data sets.
library(stringr)
z.scores <- raw3
z.scores[, c(3:22)] <- scale(z.scores[, 3:22], center = TRUE, scale = TRUE)
z.long <- z.scores %>%
pivot_longer(cols = -c(ID, SPP),
names_to = "Measurement",
values_to = "Value") %>%
mutate(Method = ifelse(str_detect(Measurement, "\\.dig"), "Digital", "Manual"), #escape the dot so it matches literally
Character = str_remove_all(Measurement, "\\.dig|\\.man")) %>%
select(ID, Method, Character, Value)
z.wide <- z.long %>%
pivot_wider(names_from = Character, values_from = Value)
Now let’s run the PCA.
library(factoextra)
PCA_z <- prcomp(z.wide[, 3:12])
summary(PCA_z)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.8436 0.8130 0.62614 0.55854 0.46266 0.33455 0.26840
## Proportion of Variance 0.8098 0.0662 0.03926 0.03124 0.02144 0.01121 0.00721
## Cumulative Proportion 0.8098 0.8760 0.91527 0.94652 0.96795 0.97916 0.98638
## PC8 PC9 PC10
## Standard deviation 0.26010 0.23545 0.1137
## Proportion of Variance 0.00678 0.00555 0.0013
## Cumulative Proportion 0.99315 0.99870 1.0000
(loadings_z <- PCA_z$rotation[, 1:5])
## PC1 PC2 PC3 PC4 PC5
## SL -0.3445531 -0.114345286 -0.005710169 0.17154564 0.19310638
## BD1 -0.3316972 0.156767306 0.075155460 0.22928394 0.04281623
## CPD -0.3386258 0.053705753 0.069041230 0.14616615 0.12970500
## CPL -0.3296284 -0.169510427 0.005859031 0.20431984 0.38364527
## PreDL -0.3169342 -0.437246343 -0.116280333 0.19844760 0.16662862
## DbL -0.2576666 0.798347380 0.229640535 0.01923739 0.07996431
## HL -0.3043295 0.001361965 -0.484743514 -0.68113463 0.02637547
## HD -0.3333319 0.135633642 -0.223446752 -0.26299302 -0.01135247
## SnL -0.2814261 -0.288741515 0.771066560 -0.43214578 -0.23115811
## OL -0.3133228 -0.020449151 -0.209317400 0.30867235 -0.84169568
(z.eig.val <- get_eigenvalue(PCA_z))
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 8.08619177 80.9815360 80.98154
## Dim.2 0.66098035 6.6195814 87.60112
## Dim.3 0.39204658 3.9262653 91.52738
## Dim.4 0.31196561 3.1242710 94.65165
## Dim.5 0.21405644 2.1437310 96.79538
## Dim.6 0.11192433 1.1208990 97.91628
## Dim.7 0.07204117 0.7214774 98.63776
## Dim.8 0.06765296 0.6775304 99.31529
## Dim.9 0.05543776 0.5551977 99.87049
## Dim.10 0.01293197 0.1295110 100.00000
z.ind <- get_pca_ind(PCA_z)
z.wide <- cbind(z.wide, z.ind$coord[,1:3])
z.sorted.loadings.1 <- loadings_z[order(loadings_z[, 1]), 1]
z.myTitle <- "Loadings Plot for PC1"
z.myXlab <- "Variable Loadings"
dotchart(z.sorted.loadings.1, main=z.myTitle, xlab=z.myXlab, cex=1.5, col="red")
z.sorted.loadings.2 <- loadings_z[order(loadings_z[, 2]), 2]
z.myTitle <- "Loadings Plot for PC2"
z.myXlab <- "Variable Loadings"
dotchart(z.sorted.loadings.2, main=z.myTitle, xlab=z.myXlab, cex=1.5, col="red")
z.sorted.loadings.3 <- loadings_z[order(loadings_z[, 3]), 3]
z.myTitle <- "Loadings Plot for PC3"
z.myXlab <- "Variable Loadings"
dotchart(z.sorted.loadings.3, main=z.myTitle, xlab=z.myXlab, cex=1.5, col="red")
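The three repeated dot-chart chunks above can be written as one loop. This reuses `loadings_z` from the PCA chunk above; the `exists()` guard with a stand-in rotation matrix from a built-in data set is ours, only so the sketch also runs on its own:

```r
# Sketch (ours): the three loadings dot charts as a single loop
if (!exists("loadings_z")) {
  # stand-in loadings for a standalone run; the document uses PCA_z$rotation
  loadings_z <- prcomp(scale(mtcars[, 1:6]))$rotation
}

for (i in 1:3) {
  sl <- loadings_z[order(loadings_z[, i]), i]   # sort variables by loading
  dotchart(sl, main = paste0("Loadings Plot for PC", i),
           xlab = "Variable Loadings", cex = 1.5, col = "red")
}
```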
Now let’s do the comparisons
No significant differences.
zw.dig <- z.wide %>%
filter(Method == "Digital")
zw.man <- z.wide %>%
filter(Method == "Manual")
(pc1 <- t.test(zw.dig$Dim.1, zw.man$Dim.1, paired= TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: zw.dig$Dim.1 and zw.man$Dim.1
## t = 1.0629e-14, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.07977954 0.07977954
## sample estimates:
## mean difference
## 4.310954e-16
(pc2 <- t.test(zw.dig$Dim.2, zw.man$Dim.2, paired= TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: zw.dig$Dim.2 and zw.man$Dim.2
## t = -1.6152e-14, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.0429809 0.0429809
## sample estimates:
## mean difference
## -3.529389e-16
(pc3 <- t.test(zw.dig$Dim.3, zw.man$Dim.3, paired= TRUE, alternative = "two.sided"))
##
## Paired t-test
##
## data: zw.dig$Dim.3 and zw.man$Dim.3
## t = 1.3251e-14, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.08185152 0.08185152
## sample estimates:
## mean difference
## 5.514179e-16
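The three paired t-tests could also be run in one pass. A sketch with toy score data standing in for `zw.dig`/`zw.man` (the `.toy` names are ours):

```r
# Sketch (ours): paired t-tests over the first three PCs in one loop;
# toy paired scores stand in for zw.dig / zw.man
set.seed(2)
zw.dig.toy <- data.frame(Dim.1 = rnorm(20), Dim.2 = rnorm(20), Dim.3 = rnorm(20))
zw.man.toy <- as.data.frame(as.matrix(zw.dig.toy) + rnorm(60, sd = 0.05))

pvals <- sapply(paste0("Dim.", 1:3), function(pc) {
  t.test(zw.dig.toy[[pc]], zw.man.toy[[pc]], paired = TRUE)$p.value
})
pvals
```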
No significant differences in variance for PC1 or PC2; PC3 shows a marginally significant difference (p = 0.043).
(pc1V <- leveneTest(Dim.1~Method, data=z.wide))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.0024 0.961
## 676
(pc2V <- leveneTest(Dim.2~Method, data=z.wide))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1.7639 0.1846
## 676
(pc3V <- leveneTest(Dim.3~Method, data=z.wide))
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 4.107 0.0431 *
## 676
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
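The Levene’s tests can be looped the same way; wrapping `Method` in `factor()` also avoids the coercion warning seen above. Toy data stands in for `z.wide` here:

```r
# Sketch (ours): Levene's tests over the first three PCs in one loop;
# factor(Method) avoids the "group coerced to factor" warning
library(car)

set.seed(3)
z.toy <- data.frame(
  Method = rep(c("Digital", "Manual"), each = 25),
  Dim.1 = rnorm(50), Dim.2 = rnorm(50), Dim.3 = rnorm(50)
)

lev_p <- sapply(paste0("Dim.", 1:3), function(pc) {
  lt <- leveneTest(as.formula(paste(pc, "~ factor(Method)")), data = z.toy)
  lt[1, "Pr(>F)"]   # p-value for the group term
})
lev_p
```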
library(ggfortify) # provides autoplot() for prcomp objects
digVman <- autoplot(PCA_z, data = z.wide, colour = "Method",
                    loadings = FALSE, loadings.label = FALSE,
                    frame = TRUE, frame.type = 'norm') +
  scale_color_manual(values = c("#E1AF00", "#78B7C5")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5")) +
  ggtitle("PCA Plot of Methods") +
  theme_classic()
digVman
Our main goal with this R analysis was to compare the precision of manual and digital measurements.
We first use Bland-Altman plots, which assess how closely two methods agree. We see that for each characteristic and for our combined data, the bias is close to zero (range: -0.19 to 0.14) with LOAs that include zero, indicating little to no difference between the two methods. Note that a bias greater than zero indicates digital measurements were on average larger, whereas a bias less than zero indicates larger manual measurements. While the statistical tests for the Bland-Altman plots (t-tests of whether the bias differs from zero) indicate a significant difference between the methods, we feel that the biases and LOAs themselves indicate no practical difference. The significant results are likely due to small deviations in an otherwise consistent data set; these register as statistically significant, but in reality the differences are minuscule.
Since these characteristics are not independent of each other, a more robust statistical test (than the t-tests on individual characteristics) is a repeated measures MANOVA. This allowed us to compare the manual and digital methods while treating all the characteristics as related variables. Similar to our Bland-Altman results, we see a significant difference (p<0.001). However, the effect size is small (0.0267). This again illustrates that while there is a statistical difference between the two methods, it is not practically meaningful: only 2.67% of the variance in our data is explained by measurement method.
Lastly, to characterize the overall shape difference and compare variation between the methods, we performed a principal component analysis. Paired t-tests on the PC scores showed no significant differences, and Levene’s tests showed no significant differences in variance apart from a marginal result for PC3 (p = 0.043), corroborating our interpretation of the practical results from our previous analyses.
Overall, we conclude there is no practical difference between manual and digital measurements.